Pattern Matching in Huffman Encoded Texts

Authors

  • Shmuel Tomi Klein
  • Dana Shapira

Similar resources

Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts

Techniques in processing text files “as is” are presented, in which given text files are processed without modification. The compressed pattern matching problem, first defined by Amir and Benson (1992), is a good example of the “as-is” principle. Another example is string matching over multi-byte character texts, which is a significant problem common to oriental languages such as Japanese, Kore...


Speeding Up String Pattern Matching by Text Compression: The Dawn of a New Era

This paper describes our recent studies on string pattern matching in compressed texts mainly from practical viewpoints. The aim is to speed up the string pattern matching task, in comparison with an ordinary search over the original texts. We have successfully developed (1) an AC type algorithm for searching in Huffman encoded files, and (2) a KMP type algorithm and (3) a BM type algorithm for...


Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts

In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the Boyer-Moore strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit a sublinear behavior...
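The codeword-alignment issue mentioned above can be illustrated with a minimal Python sketch. It is not the skeleton-tree verification or the Boyer-Moore search from the paper: the prefix code `CODE` and the helpers `encode`, `codeword_starts`, and `search` are hypothetical names chosen for illustration, and alignment is checked naively by decoding the whole text from the start.

```python
# Hypothetical prefix code chosen for illustration (not taken from the paper).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(text):
    """Concatenate the codewords of the characters of `text` as a bit string."""
    return "".join(CODE[ch] for ch in text)


def codeword_starts(bits):
    """Return the set of bit offsets at which a codeword begins.

    Naive check: decode from offset 0 and record every boundary.
    """
    decode = {v: k for k, v in CODE.items()}
    starts, cur = set(), ""
    for i, bit in enumerate(bits):
        if not cur:
            starts.add(i)
        cur += bit
        if cur in decode:  # a complete codeword has been read
            cur = ""
    return starts


def search(text_bits, pattern):
    """Find the encoded pattern in the encoded text.

    Returns (aligned, false_hits): bit offsets that start on a codeword
    boundary, and bit offsets that match bitwise but are not aligned.
    """
    pat_bits = encode(pattern)
    starts = codeword_starts(text_bits)
    aligned, false_hits = [], []
    i = text_bits.find(pat_bits)
    while i != -1:
        (aligned if i in starts else false_hits).append(i)
        i = text_bits.find(pat_bits, i + 1)
    return aligned, false_hits


if __name__ == "__main__":
    bits = encode("cabad")
    print(search(bits, "ba"))
```

With this code, the text "cabad" encodes to `1100100111` and the pattern "ba" to `100`. The bit pattern occurs at offsets 1 and 4, but only offset 4 begins a codeword, so offset 1 is a false match that a verification phase has to reject.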


More Speed and More Compression: Accelerating Pattern Matching by Text Compression

This paper addresses the problem of speeding up string matching by text compression, and presents a compressed pattern matching (CPM) algorithm which finds a pattern within a text given as a collage system 〈D,S〉 such that variable sequence S is encoded by byte-oriented Huffman coding. The compression ratio is high compared with existing CPM algorithms addressing the problem, and the search time...


Processing of Huffman Compressed Texts with a Super-Alphabet

We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in $O(n \log_2 \sigma / b)$ time, where n is the size of the compressed text in bytes, σ is the size of the alphabet, and b is a user specified parameter. The method uses a variable size super-alphabet, with an average size of $O(b / (H \log_2 \sigma))$ symbols, where H is the entropy of the text. Each ...



Publication date: 2001